***Adding measurement error to location data to protect subject confidentiality while allowing for consistent estimation of exposure effects
***The Journal of the Royal Statistical Society: Series C (Applied Statistics)

***REPLICATION DO FILE 2: SIMULATION WITH DEMONSTRATION OF ERROR CORRECTION

***Mahesh Karra, David Canning, Ryoko Sato
***February 29, 2020

version 13

***STEP 0: STORE WORKING FILES AND SETTING WORKING DIRECTORY

***Make sure that all do files and data files that are provided with this replication package are stored in the same folder.

***Users will need to tailor the file directory pathway to the specific filepath where the replication package is stored.
***To use the correct file directory, replace "maindir" with the appropriate file path under which your Dropbox folder is stored.

if "`c(username)'"=="mvkarra" {
                global maindir "C:\Users\mvkarra\Dropbox\Harvard 2015-2016\POP CENTER\Distance Paper - Part 2\DOCUMENTS\JOURNAL SUBMISSIONS\JRSS SERIES C\Replication Files"
}
else if "`c(username)'"=="[another username here]" {
                global maindir "[same as above]"
}
 
***USE WOMAN AND HF GPS DATA: DATASET 1
use "$maindir\DATA_1_WOMAN_GPS_2_29_20.dta", clear

***SET SEED
set seed 22071987

********************************************************************************
***SIMULATION EXERCISE
********************************************************************************

***[1] Create (1) perturbed distance from actual distance (dist5_e`s') (2) the fixed distance from the perturbed distance (log: E_min_dg_i_`j')

***NUMBER OF SIMULATIONS: 1
***For the demonstration, we have set s, the number of replications, to 1. For the JRSS paper, we conducted 1,000 replications for the results.
local s = 1


***Sort woman data by Woman ID
destring womanid, gen(wid)
format wid %20.0g
sort wid

***SIMULATION WITH s REPLICATIONS
while `s'<=1 {

***Create random angle, theta, in 0-360
gen theta = runiform(0,2*_pi)

***Create random distance, m5, with a uniform distribution (error = 5 km)
gen m5 = runiform(0,5)

***We need to convert the angle and distance (km) to latitude/longitude points
***We refer to http://www.longitudestore.com/how-big-is-one-gps-degree.html

gen errorx5 = sin(theta)*m5/110.57
gen errory5 = cos(theta)*m5/111.03

gen x_new5 = longi + errorx5
gen y_new5 = lati + errory5

drop theta m5
drop errorx5 errory5

***Measure the distance from the fake HH location (with 5 km error) to the real HF location
globdist dist5_e`s' , lat0(y_new5) lon0(x_new5) latvar(latitude) lonvar(longitude)

*Append GRID info
append using "$maindir\DATA_2_ARUSHA_GRID_GPS_2_29_20.dta"


***STEP 1: FOR EACH GRID POINT, WE CALCULATE DISTANCE TO OBSERVED CLUSTER POINT y_ip, DENOTED p_mx_i
***This is the distance from each grid to the fake (perturbed) HH location 

sort wid
gen id=_n if _n<=820

sort id
local l = 1 
while `l' <= 820 {
local j = y_new5[`l']
local k = x_new5[`l']
globdist distc`l' , lat0(`j') lon0(`k') latvar(grid_lati) lonvar(grid_longi)
local l=`l'+1
}

***STEP 2: GENERATE prob_mx_i, WHICH IS 0 IF p_mx_i > 5 AND 1/(10pi*p_mx_i) IF p_mx_i <= 5
***For each grid, we get the number (create new var (number of HH) times) 

sort id
forvalues j = 1/820 {
gen prob_mx_i_`j' = 0 if _n>820 
replace prob_mx_i_`j' = popdens2015*(1 / (10*_pi*distc`j')) if distc`j'<= 5
egen sum_prob_mx_i_`j' = sum(prob_mx_i_`j') if _n>820
}

***hfid = Health Facility ID
gen hfid = census_id
move hfid dist_hffp

***gridid = Grid ID
gen gridid=_n-820 if _n>=821
move gridid id

***
***GENERATE MINIMUM DISTANCE FROM THE GRID POINT TO EACH HF
merge m:1 grid_longi grid_lati using "$maindir\DATA_3_ERROR_FIX_HF_2_29_20.dta"

sort wid
sort wid hfid gridid

***STEP 3: NUMERICAL INTEGRATION
***See the simplified equation (12) in the paper in page 1258
forvalues j = 1/820 {
gen exp_min_dg_i`j' = distcHF`j' * (prob_mx_i_`j' / sum_prob_mx_i_`j')
egen E_min_dg_i_`j' = sum(exp_min_dg_i`j')
}

gen E_min_dy`s' = .
local j = 1
while `j'<=820 {
replace E_min_dy`s' = E_min_dg_i_`j'[1] if id==`j'
local j =`j'+1
}

drop distc1- distc820 prob_mx_i_1- sum_prob_mx_i_820 distcHF1-distcHF820 exp_min_dg_i1- E_min_dg_i_820 x_new5 y_new5 id hfid gridid _merge
drop if grid_longi~=. & grid_lati~=. 

local s =`s'+1
}


keep if womanid~=""


***[2] Run comparisons of subjective distance on actual distance

***Generate the log of subjective distance
gen lnkm_hf = log(km_hf) 

***Generate the log of actual distance
gen lndist_hffp = log(dist_hffp)

***REGRESSION: Subjective distance on actual distance
reg lnkm_hf lndist_hffp


***[3] Run the regression of subjective distance on the perturbed distance (1 replication), generating random perturbation < 5km for each replication)

***Generae the log of perturbed distance

***NUMBER OF SIMULATIONS: 1
local j = 1

while `j'<= 1 {
gen lndist5_e`j'=log(dist5_e`j')
local j =`j'+1
}

reg lnkm_hf lndist5_e1


*[4] Run the subjective distance on corrected distance
reg lnkm_hf E_min_dy1
